A quantitative model for the dynamics of target recognition and off-target rejection by the CRISPR-Cas Cascade complex

CRISPR-Cas effector complexes recognise nucleic acid targets by base pairing with their crRNA, which enables easy re-programming of the target specificity in rapidly emerging genome engineering applications. However, undesired recognition of off-targets that are only partially complementary to the crRNA occurs frequently and represents a severe limitation of the technique. Off-targeting lacks comprehensive quantitative understanding and prediction. Here, we present a detailed analysis of the target recognition dynamics of the Cascade surveillance complex on a set of mismatched DNA targets using single-molecule supercoiling experiments. We demonstrate that the observed dynamics can be quantitatively modelled as a random walk over the length of the crRNA-DNA hybrid using a minimal set of parameters. The model accurately describes the recognition of targets with single and double mutations, providing an important basis for quantitative off-target predictions. Importantly, the model intrinsically accounts for the observed bias regarding the position of and the proximity between mutations, and reveals that the seed length for the initiation of target recognition is controlled by DNA supercoiling rather than by the Cascade structure.


Supplementary Note 1: Calculation of first passage times between different states of a one-dimensional random walk

A) First passage time for terminal positions of start and end state
As described in the main text, we consider a one-dimensional (linear) random walk containing $N+1$ positions $(0, 1, 2, \dots, N)$. The model is fully parametrized by a set of rate constants $k_n^{+}$ and $k_n^{-}$ describing the transitions from any position $n$ to its two neighboring positions (Supplementary Fig. 1). Rate constants marked with '+' or '-' correspond to forward and backward transitions, respectively (i.e. to transitions with increasing or decreasing position number). We first consider a random walk that starts at position 0 and terminates at position $N$, which corresponds to full R-loop formation. The mean time the walk needs to arrive at the last position for the first time is called the mean first-passage time $\tau_{0\to N}$. To calculate $\tau_{0\to N}$, we place a single particle into the system, with $P_n$ being the probability of finding the particle at position $n$. We next introduce a reflecting boundary at the start position 0 (ensuring that there is no escape towards negative positions) and a transmissive boundary at position $N$. When the particle arrives at $N$, this boundary instantaneously places it back at the start. Thus, the probability of finding the particle at position $N$ is zero at all times ($P_N = 0$). The forward particle flux between two neighboring positions $n$ and $n+1$ is given by:

$$J_n = k_n^{+} P_n - k_{n+1}^{-} P_{n+1}$$

We furthermore impose steady-state conditions on the system, i.e. the probabilities shall not change over time:

$$\frac{dP_n}{dt} = J_{n-1} - J_n = 0 \qquad (0 < n < N)$$

From this condition (with the particle arriving at $N$ being reinjected at position 0) we get that the flux is constant throughout the system, i.e. $J_n = J_0$ for all $n$. For a single particle in the system, the mean first-passage time is then the reciprocal of the steady-state particle flux, $\tau_{0\to N} = 1/J_0$. With this, the first equation can be transformed into a recursion that, starting from $P_N = 0$, yields all probabilities normalized by the flux (empty products equal 1):

$$\frac{P_n}{J_0} = \frac{1}{k_n^{+}} + \frac{k_{n+1}^{-}}{k_n^{+}}\,\frac{P_{n+1}}{J_0} = \frac{1}{k_n^{+}} \sum_{j=n}^{N-1} \prod_{i=n+1}^{j} \frac{k_i^{-}}{k_i^{+}}$$

Combining the upper two equations with the normalization condition $\sum_{n=0}^{N-1} P_n = 1$ provides:

$$\tau_{0\to N} = \frac{1}{J_0} = \sum_{n=0}^{N-1} \frac{P_n}{J_0} = \sum_{n=0}^{N-1} \frac{1}{k_n^{+}} \sum_{j=n}^{N-1} \prod_{i=n+1}^{j} \frac{k_i^{-}}{k_i^{+}}$$

Supplementary Fig. 1 Scheme of the one-dimensional random walk with start at position 0 and end at position $N$. $k_n^{+}$ and $k_n^{-}$ define the rate constants for the transitions between neighboring positions. To calculate the mean first-passage time for reaching position $N$, a reflecting boundary is introduced at position 0 while a transmissive boundary is placed at position $N$. A particle that reaches position $N$ is instantaneously placed back at the start.
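The mean first-passage time can be checked numerically. The following is a minimal Python sketch, assuming the rates are supplied as plain lists indexed by position ($k_n^{+}$ in `k_plus`, $k_n^{-}$ in `k_minus`, for $n = 0, \dots, N-1$); the function names are hypothetical and not part of the original analysis. It evaluates the steady-state recursion $P_n/J_0 = (1 + k_{n+1}^{-} P_{n+1}/J_0)/k_n^{+}$ starting from $P_N = 0$, sums $\tau_{0\to N} = 1/J_0 = \sum_n P_n/J_0$, and cross-checks the result against the explicit double sum $\sum_{n} (1/k_n^{+}) \sum_{j=n}^{N-1} \prod_{i=n+1}^{j} k_i^{-}/k_i^{+}$.

```python
from math import prod


def mean_first_passage_time(k_plus, k_minus):
    """Mean first-passage time from position 0 to position N (hypothetical helper).

    k_plus[n] is the forward rate k_n^+ and k_minus[n] the backward rate k_n^-
    for positions n = 0 .. N-1. Position 0 is reflecting (k_minus[0] is unused)
    and position N is the transmissive end, so P_N = 0.
    """
    n_end = len(k_plus)  # N
    if len(k_minus) != n_end or n_end == 0:
        raise ValueError("k_plus and k_minus must both have length N >= 1")
    tau = 0.0
    p_next = 0.0  # P_{n+1} / J_0, starting from P_N / J_0 = 0
    for n in range(n_end - 1, -1, -1):
        # Backward rate out of position n+1 (irrelevant at n = N-1 because P_N = 0)
        k_back = k_minus[n + 1] if n + 1 < n_end else 0.0
        # Steady-state recursion: P_n/J_0 = (1 + k_{n+1}^- * P_{n+1}/J_0) / k_n^+
        p_n = (1.0 + k_back * p_next) / k_plus[n]
        tau += p_n  # normalization: tau = 1/J_0 = sum_n P_n/J_0
        p_next = p_n
    return tau


def mean_first_passage_time_closed_form(k_plus, k_minus):
    """Same quantity from the explicit double sum (empty products equal 1)."""
    n_end = len(k_plus)
    ratios = [km / kp for kp, km in zip(k_plus, k_minus)]  # k_i^- / k_i^+
    return sum(
        sum(prod(ratios[n + 1 : j + 1]) for j in range(n, n_end)) / k_plus[n]
        for n in range(n_end)
    )


# Unbiased walk with unit rates: tau = N(N+1)/2, here 6.0 for N = 3
print(mean_first_passage_time([1.0, 1.0, 1.0], [0.0, 1.0, 1.0]))
```

The recursion runs in $O(N)$, whereas the double sum as written costs $O(N^3)$; for the short walks considered here either is effectively instantaneous, and agreement between the two is a useful guard against indexing mistakes.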

B) Transition rates for a bidirectional random walk from an internal start to end states at either side
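Longer-lived R-loop intermediates that form in front of single mismatch positions can either collapse (return to position 0) or expand to form a full R-loop (reach position $N$). In the framework of a one-dimensional random walk, this can be described by a bidirectional walk that starts at an internal position $m$ and has two possible end states. To calculate the mean transition rates to either end state, we place a single particle into the system together with transmissive boundaries at both ends and impose steady-state conditions (Supplementary Fig. 2), using the forward and backward rate constants $k_n^{+}$ and $k_n^{-}$ as in A). The particle flux from the start position splits into a flux $J^{-}$ towards 0 and a flux $J^{+}$ towards $N$. Due to the steady-state conditions (see above), $J^{-} = -J_n$ between any two adjacent positions smaller than or equal to $m$, and similarly $J^{+} = J_n$ between any two adjacent positions larger than or equal to $m$. With

$$J_n = k_n^{+} P_n - k_{n+1}^{-} P_{n+1}$$

and $P_0 = 0$ as well as $P_N = 0$ (due to the transmissive boundaries), we can obtain all probabilities, normalized by the respective flux, in a recursive manner. For positions $n \ge m$ we get, as before:

$$\frac{P_n}{J^{+}} = \frac{1}{k_n^{+}} \sum_{j=n}^{N-1} \prod_{i=n+1}^{j} \frac{k_i^{-}}{k_i^{+}}$$

and for positions $n \le m$, starting from $P_0 = 0$:

$$\frac{P_n}{J^{-}} = \frac{1}{k_n^{-}} \sum_{j=1}^{n} \prod_{i=j}^{n-1} \frac{k_i^{+}}{k_i^{-}}$$

Since both expressions must yield the same probability $P_m$ at the start position,

$$\alpha = \frac{J^{+}}{J^{-}} = \frac{P_m/J^{-}}{P_m/J^{+}}$$

provides the ratio between forward and backward flux, i.e. the ratio between the probability that a particle reaches the end at $N$ and the probability that it reaches the end at 0. Let $J = J^{+} + J^{-}$ be the total particle flux in the system. We can then express backward and forward flux as:

$$J^{-} = \frac{J^{-}}{J^{+} + J^{-}}\,J = \frac{1}{1 + \alpha}\,J \qquad\text{and}\qquad J^{+} = \frac{J^{+}}{J^{+} + J^{-}}\,J = \frac{1}{1 + 1/\alpha}\,J = \frac{\alpha}{1 + \alpha}\,J$$

$J^{+}$ and $J^{-}$ correspond to the unidirectional particle fluxes with which particles arrive at either end. They thus correspond to the transition rates $k_{m\to 0} = J^{-}$ and $k_{m\to N} = J^{+}$ from position $m$ to positions 0 and $N$, respectively. With this we obtain expressions for the particle probabilities normalized by the total flux, $P_n/J$. For $n \le m$ we get:

$$\frac{P_n}{J} = \frac{1}{1 + \alpha}\,\frac{P_n}{J^{-}}$$

and for $n \ge m$ we get:

$$\frac{P_n}{J} = \frac{\alpha}{1 + \alpha}\,\frac{P_n}{J^{+}}$$

Using the normalization condition that exactly one particle is in the system at all times, $\sum_{n=1}^{N-1} P_n = 1$, we get an expression for the total flux:

$$J = \left[\sum_{n=1}^{m} \frac{1}{1 + \alpha}\,\frac{P_n}{J^{-}} + \sum_{n=m+1}^{N-1} \frac{\alpha}{1 + \alpha}\,\frac{P_n}{J^{+}}\right]^{-1}$$

With this, the transition rates $k_{m\to N} = \frac{\alpha}{1+\alpha}\,J$ and $k_{m\to 0} = \frac{1}{1+\alpha}\,J$ can finally be calculated.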
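For the bidirectional walk, a similar Python sketch can make the procedure concrete. It assumes rates are given as lists of length $N+1$ indexed by position (only the internal positions $1, \dots, N-1$ are used), and the function name `exit_rates` is hypothetical. The sketch computes $P_n/J^{-}$ for $n \le m$ starting from $P_0 = 0$ and $P_n/J^{+}$ for $n \ge m$ starting from $P_N = 0$, forms the flux ratio $\alpha = J^{+}/J^{-}$ by requiring both expressions to agree at $P_m$, and fixes the total flux $J = J^{+} + J^{-}$ through the normalization $\sum_{n=1}^{N-1} P_n = 1$. The exit rates are then $k_{m\to N} = \alpha J/(1+\alpha)$ and $k_{m\to 0} = J/(1+\alpha)$, and the splitting probabilities are $p^{+} = \alpha/(1+\alpha)$ and $p^{-} = 1/(1+\alpha)$.

```python
from math import prod


def exit_rates(k_plus, k_minus, m):
    """Exit rates and splitting probabilities for a walk started at position m.

    k_plus[n], k_minus[n] hold k_n^+ and k_n^- for n = 0 .. N; only the internal
    positions 1 .. N-1 are used, because 0 and N are transmissive (absorbing) ends.
    Returns (k_to_0, k_to_N, p_minus, p_plus).
    """
    n_end = len(k_plus) - 1  # N
    if len(k_minus) != n_end + 1 or not 0 < m < n_end:
        raise ValueError("need len(k_plus) == len(k_minus) == N+1 and 0 < m < N")

    def p_over_j_minus(n):
        # P_n / J^- for 1 <= n <= m (recursion started from P_0 = 0)
        total = sum(prod(k_plus[i] / k_minus[i] for i in range(j, n)) for j in range(1, n + 1))
        return total / k_minus[n]

    def p_over_j_plus(n):
        # P_n / J^+ for m <= n <= N-1 (recursion started from P_N = 0)
        total = sum(prod(k_minus[i] / k_plus[i] for i in range(n + 1, j + 1)) for j in range(n, n_end))
        return total / k_plus[n]

    alpha = p_over_j_minus(m) / p_over_j_plus(m)  # J^+ / J^-
    p_plus = alpha / (1.0 + alpha)  # probability of reaching N first
    p_minus = 1.0 / (1.0 + alpha)  # probability of reaching 0 first
    # Normalization sum_{n=1}^{N-1} P_n = 1 fixes the total flux J = J^+ + J^-
    norm = sum(p_minus * p_over_j_minus(n) for n in range(1, m + 1))
    norm += sum(p_plus * p_over_j_plus(n) for n in range(m + 1, n_end))
    total_flux = 1.0 / norm
    return p_minus * total_flux, p_plus * total_flux, p_minus, p_plus


# Single internal state (N = 2, m = 1): rates reduce to k_1^- and k_1^+
print(exit_rates([0.0, 3.0, 0.0], [0.0, 2.0, 0.0], 1))  # ≈ (2.0, 3.0, 0.4, 0.6)
```

For an unbiased walk (all internal forward and backward rates equal) the sketch reproduces the textbook splitting probability $p^{+} = m/N$, which is a convenient sanity check before plugging in mismatch-dependent rates.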
Please note that this approach allows one to calculate any rate or probability for a transition to a given state in the system. When more intermediate states are considered (e.g. because more mismatches are present), the two end-state positions 0 and $N$ need to be replaced correspondingly by the positions of the actual states adjacent to the start position. Furthermore, with $\alpha = J^{+}/J^{-}$ being the ratio between forward and backward flux and $J = J^{+} + J^{-}$ the total flux, the probabilities of reaching the end in forward or backward direction are given as:

$$p^{+} = \frac{J^{+}}{J} = \frac{1}{1 + 1/\alpha} \qquad\text{and}\qquad p^{-} = \frac{J^{-}}{J} = \frac{1}{1 + \alpha}$$

Setting $m = 1$ then also allows one to calculate the probability that a full R-loop is formed following PAM binding, i.e. that the target is recognized.

Supplementary Fig. 2 Scheme of a one-dimensional random walk with start at an internal position $m$ and two possible end states at either side. $k_n^{+}$ and $k_n^{-}$ define the rate constants for the transitions between neighboring positions. To calculate the mean rates for reaching either end at positions 0 and $N$, transmissive boundaries are placed at both ends. A particle that reaches an end is instantaneously placed back at the start. In steady state there are two different particle fluxes: $J^{-}$ and $J^{+}$ for particles that travel to 0 and $N$, respectively.

Supplementary Fig. 3 Detailed representation of DNA length, applied turns, force and torque during the different R-loop formation experiments using magnetic tweezers. a Experiment to study the R-loop dynamics on a target that does not lock due to ≥6 bp PAM-distal mutations (20 bp in the depicted example, i.e. 12 matching bp). The experiment starts with supercoiling the DNA molecule from 0 turns to between -3 and -8 turns at constant magnet position (i). The molecule length decreases due to the DNA writhe. The magnetic force is kept constant, providing a constant negative torque on the DNA whose magnitude is controlled by the force (ii).
Thermally driven R-loop formation and collapse events, seen as discrete changes of the DNA length (see the two-state approximation shown as a red line as well as the cartoons on top), are followed over a sufficiently long time. Finally, negative DNA supercoiling is removed by turning the magnets back to 0 turns, leading to a DNA length increase (iii). The light blue trajectory represents the raw DNA length data collected at 120 Hz, the dark blue trajectory the DNA length after sliding-window filtering to 7.5 Hz. Red lines represent the two-state approximation of the trajectory. b Experiment to study the R-loop formation kinetics in case of locking. In order to measure multiple R-loop formation events, Cascade needs to be dissociated from the locked state. First, R-loop formation is facilitated by applying negative turns at a low force of 0.1-0.45 pN (transition from i to ii). After observing formation of state (iii) and subsequent locked R-loop formation (seen as a sufficiently large DNA length increase corresponding to state iv), R-loop dissociation is induced by supercoiling the DNA towards positive turns (v), followed by the application of a higher force of ~2.5 pN (vi) that provides an increased positive torque. R-loop dissociation is observed as a discrete DNA length increase (vii). For a new R-loop formation experiment, the force is again lowered and the DNA is supercoiled to the initial negative turns (i and ii). c Example of multiple R-loop formation-dissociation cycles for a substrate containing a mismatch at position 7. The left panel shows the DNA length time trajectory, depicted as in (b). Red areas of the trajectory represent the R-loop at high force and positive supercoiling (vi in (b)). Green areas of the trajectory represent the DNA length after R-loop dissociation (vii in (b)). The right panel shows the dependence of the DNA length on magnet rotation. The blue curve shows the magnet rotation after R-loop dissociation, from positive to negative supercoiling (i in (b)).
After R-loop formation (seen as an abrupt jump of the DNA length in the left panel, or as the transition from ii to iii to iv in (b)), the DNA is supercoiled from negative to positive turns, shown as a green curve (transition from iv to v in (b)). In some rare cases, the full R-loop was formed but remained unlocked (depicted as a brown area in the left panel). In these cases, the R-loop collapse occurred around zero turns, where it could not be seen (red dashed curve in the right panel). These events were not considered in further data analysis. To verify whether the R-loop was locked, we monitored the expected sudden DNA length increase upon R-loop dissociation as well as the shift of the rotation curves at positive turns expected for stable locked R-loops. d, e Enlarged views of the dissociation trajectories from (c). In (d), the presence of the R-loop is observed as a lower magnetic bead position after the DNA is positively supercoiled (v in (b)) and as the presence of two distinct states after the increase of force (red and green areas, vi and vii in (b)). In case of R-loop dissociation during the transition through 0 turns (red dashed curve in the right panel of (c)), the magnetic bead does not go as low as in the presence of an R-loop (i in (b)), and only one state is observed after the increase of force (brown area in the left panel of (c), vii in (b)), as shown in (e).
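The caption mentions sliding-window filtering from 120 Hz to 7.5 Hz and a two-state approximation of the trajectories, without specifying the exact procedures. The following minimal Python sketch works under stated assumptions: the filter is taken to be a plain moving average over $120/7.5 = 16$ frames, and the two-state approximation simply assigns each filtered sample to the nearer of two given DNA-length levels. Both choices, and the function names, are hypothetical illustrations rather than the authors' analysis pipeline.

```python
from itertools import accumulate

RAW_RATE_HZ = 120.0  # acquisition rate stated in the caption
FILTERED_RATE_HZ = 7.5  # effective rate after sliding-window filtering
# Assumption: the filter is a plain moving average over RAW/FILTERED = 16 frames
WINDOW = round(RAW_RATE_HZ / FILTERED_RATE_HZ)


def sliding_window_mean(trace, window=WINDOW):
    """Moving average of a DNA length trace; returns only fully covered windows."""
    if not 1 <= window <= len(trace):
        raise ValueError("window must be between 1 and len(trace)")
    csum = [0.0, *accumulate(trace)]  # prefix sums give O(len(trace)) cost
    return [(csum[i + window] - csum[i]) / window for i in range(len(trace) - window + 1)]


def two_state_levels(trace, low, high):
    """Hypothetical two-state approximation: snap each sample to the nearer level."""
    threshold = 0.5 * (low + high)
    return [high if x >= threshold else low for x in trace]
```

In practice the two levels would come from the state means (e.g. Gaussian fits to the length histogram), and a real step-detection method would also suppress single-sample flickers; the nearest-level rule is only the simplest possible stand-in.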

Supplementary Fig. 4 R-loop dynamics in the presence of a single mismatch measured at different Cascade concentrations. a Trajectories and histograms of the DNA length recorded for different applied torques using a target with a C:C mismatch at position 17 (light blue). Solid lines in the histograms represent Gaussian fits to the 3 different states, while horizontal dashed lines indicate the average DNA length of each state. Bars represent the theoretically predicted occupancies using the parameters shown in Supplementary Table 1. b Single base-pair stepping rates for the different mismatches obtained from the transition rate fits. The dashed line represents the mean rate. Error bars correspond to the SD of the fit parameter (67% confidence interval). c Free energy change of R-loop initiation at a Cascade concentration of 170 nM for the different mismatches obtained from the transition rate fits. The dashed line represents the mean. Error bars correspond to the SD of the fit parameter (67% confidence interval). d DNA length trajectories and occupancies measured at different Cascade concentrations on a target containing a C:C mismatch at position 14 and 6 PAM-distal mismatches. With increasing concentration, the full R-loop state becomes increasingly populated. e Intermediate R-loop formation rate $k_1$ as a function of the Cascade concentration at -4.7 pN nm (open circles). The blue line represents a linear fit to the data. Error bars correspond to SEM. f Best-fit parameters obtained for the different concentrations from the fits of the transition rates between the different R-loop intermediates (see (g)). Error bars correspond to the SD of the fit parameter (67% confidence interval). g Experimental transition rates as a function of torque for the different Cascade concentrations (open circles). Global fits to all rates at the given concentrations are shown as solid lines. Error bars correspond to SEM. Precise sample sizes are given in Supplementary Table 6. For statistical testing, one-way ANOVA was used.

Supplementary Fig. 5 Simulations of R-loop length and magnetic tweezers experiments for single mismatch targets. a, c Comparison between trajectories of a magnetic tweezers experiment (blue), a random walk simulation of the R-loop length (green) and a Brownian dynamics simulation of a magnetic tweezers experiment (red) for a target containing a C:C mismatch (a) or a C:A mismatch (c) at position 17 and 6 terminal mismatches. Light colors show the raw results of the experiments and simulations, while dark colors show the corresponding 3-state approximations. b, d Enlarged views of the simulated trajectories from the areas delimited by dashed lines in (a) and (c), respectively, depicting transitions extracted by the 3-state approximations of the simulated magnetic tweezers trajectories. Dashed boxes depict transitions contributing to $k_3$ and $k_4$. e Comparison of the extracted rates for measured and simulated trajectories. The measured $k_3$ rate was adjusted by the ratio between the measured $k_4$ rates for C:C, C:T and C:A mismatches. Assuming that $k_4$ for the C:A mismatch is in the same range as for C:C and C:T, the C:A values were shifted upward. The same ratio was used to shift the $k_3$ values. After the adjustment, the determined mismatch penalty values were used to perform Brownian dynamics simulations and to compare rates from the simulations and the magnetic tweezers experiments. Error bars in all plots correspond to SEM. Precise sample sizes are given in Supplementary Table 6.

Supplementary Fig. 6 Additional data for locked R-loop formation on targets with single internal mismatches. a Example trajectories for R-loop formation on targets containing a single C:C mismatch at various positions (as indicated). Colored sections represent the actual R-loop formation events. The width of a formation event is roughly proportional to the R-loop formation time, which gives a visual impression of the R-loop formation time for the different mismatches and torques.
Light blue trajectories represent raw magnetic tweezers data collected at 120 Hz, dark trajectories represent the data after sliding-window filtering to 7.5 Hz, and red lines represent 2-state approximations of the trajectories for the WT and M7 targets (the intermediate state is too short-lived to be observed for M7) and 3-state approximations for the M17 and M14 targets. b R-loop formation kinetics for the different targets at a torque of -5.2 pN nm (open circles), represented as the normalized event count over the time of event occurrence. Single exponential fits to the data are shown as solid lines. c Comparison of the torque dependence of R-loop formation for the WT target and a target with a PAM mutation at position -1 (see Supplementary Table 3). Data shown in gray are from Fig. 4 (main text). d Torque dependence of the R-loop formation times for the different targets with single internal mismatches, plotted on a semi-logarithmic time scale including extrapolation of the fit curves to zero torque. Notably, the difference between the targets with single mismatches vanishes at zero torque, while a strong difference to the WT target persists. e Scheme of the fluorescence bulk solution measurements of the R-loop formation kinetics in the absence of supercoiling (zero torque), involving a donor-acceptor dye pair at the PAM-distal end of the target. The donor fluorescence signal increases mainly due to R-loop formation and locking (ii, iii) but not due to initial PAM binding (i). f Kinetics of R-loop formation in the absence of supercoiling for WT and mismatched targets. All traces show the average of three replicates.
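Panel b reports single-exponential fits to the normalized event count of R-loop formation times. The fitting procedure is not specified here; as one standard option, the Python sketch below uses the maximum-likelihood estimate of the rate of an exponential waiting-time distribution, $k = n / \sum_i t_i$, and builds a normalized cumulative event count that can be compared with $1 - e^{-kt}$. This choice and the function names are assumptions for illustration, and the estimator ignores events that were not observed within the measurement time window.

```python
from math import exp


def exponential_rate_mle(times):
    """Maximum-likelihood rate k of a single-exponential distribution of waiting times."""
    if not times or any(t < 0 for t in times) or sum(times) == 0:
        raise ValueError("need at least one non-negative, non-zero waiting time")
    return len(times) / sum(times)  # k = n / sum(t_i); mean formation time = 1/k


def cumulative_event_fraction(times):
    """Normalized cumulative event count: (t_i, i/n) for sorted waiting times."""
    ordered = sorted(times)
    n = len(ordered)
    return [(t, (i + 1) / n) for i, t in enumerate(ordered)]


def single_exponential(t, rate):
    """Expected cumulative fraction 1 - exp(-k t) for a single-exponential process."""
    return 1.0 - exp(-rate * t)
```

Plotting `cumulative_event_fraction(times)` against `single_exponential(t, exponential_rate_mle(times))` gives a quick visual check of whether a single exponential describes the formation times, analogous to the solid-line fits in panel b.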